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Regulatory gene networks contain generic modules, like those involving feed- 
back loops, which are essential for the regulation of many biological func- 
^ ' tions [18j. We consider a class of self-regulated genes which are the building 
l/^ . blocks of many regulatory gene networks, and study the steady-state distri- 

Q \ bution of the associated Gillespie algorithm by providing efficient numerical 
' algorithms. We also study a regulatory gene network of interest in gene ther- 
apy, using mean-field models with time delays. Convergence of the related 
time-nonhomogenous Markov chain is established for a class of linear catalytic 
networks with feedback loops. 

Q>^ ■ Keywords: Gillespie algorithm, gene network, self promoter, quasi-equilibrium, dimeriza- 

rn . tion, mean field, time delay. 
(N 
O 



1 Introduction 



Modeling of the regulation of all genes in a given cell is a tantalizing problem in biology and 
medicine (see, e.g. |18j). Recent developments allow rapid experimental determination 
^ ' of the expression of nearly all genes in a given biological setting, to an extent that in 
depth analysis and proper mathematical understanding of these vast arrays of data has 
become limiting. Qualitative models of regulatory networks, where particular genes code 
for proteins that activate or repress other genes, are being assembled, but models taking 
the stochastic and quantitative nature of gene regulation remain scarce, and they often 
rely on assumptions or simplifications that rest untested experimentally. Thus, it would 
be useful to build validated mathematical models of particular regulatory modules, as a 
first step towards constructing models of genome-wide gene expression. 

Here, we consider a class of self-regulated genes, as depicted in Figure [H This auto- 
regulated module is a very common building block of many gene networks, as it may 
form the basis of stochastic gene switches that contribute to biological decisions such 
as cell differentiation, and has been studied extensively in the literature in some special 
settings, as in [31], [21] or [23]. In a previous work, |llj, we provided the exact steady-state 
distribution of the stochastic expression level of the autoregulated gene using the Gillespie 
algorithm in a general setting. We will present a direct version of the method and study 
more deeply this stationary distribution by providing efficient numerical algorithms. We 
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Figure 1: The self-regulated gene 
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will also consider a synthetic regulatory network acting as a genetic switch that was studied 
in living cells |22j . 



A self- regulated gene 

The system is composed of a promoter and a gene, as schematized in Figure [TJ As 
stated previously, one source of molecular noise is the random nature of the states taken 
by the promoter (on/off). Figure [1] shows protein monomers produced by the RNA poly- 
merase during the transcription and translation processes. Protein monomers react quickly 
to form dimers: we assume a quasi-equilibrium where fast reactions equilibrate instanta- 
neously. For a global amount of n proteins, the proportion of dimers at quasi-equilibrium 
is a well defined function of n. Dimers can bind to some sites near the promoter, and 
therefore enhance transcription, corresponding to a positive feedback loop. These binding 
events can be assumed to be fast with respect to events like protein formation. They 
are however included in some chain of events which ends with a state where the right 
positioning of the RNA polymerase is possible. This will correspond to the on state Oi. 
When these conditions are not satisfied, the promoter is off Oq. The rates of transitions 
between these two states are functions of the proportion of dimers, and therefore of n 
when the cell contains n proteins. These random events are usually modeled by supposing 
that the probability that the promoter switches from the off to on state in a small time 
interval of length /i ~ is of order g(n)h for n proteins, where the function g can be 
chosen according to the specificity of the setting. To be as general as possible, and to 
eventually allow negative feedback loops, we also assume that the probability of transition 
of the reverse reaction is given by some function K{n). Basal activity is introduced by sup- 
posing that g(0) is positive, so that the required conditions for an eventual transcription 
event can be realized without protein dimers. The remaining involved chemical reactions 
are essentially protein monomers production and degradation, which are summarized in 
Figured! Transcription is stopped when the promoter is off, so that we assume that the 
probability ^^h that a protein is created during a small time interval of length h vanishes, 
with /xq = 0. When the promoter is on, transcription is possible, and the probability that 
a transcription event occurs is of order /xi/i. Degradation of protein dimers is summarized 
by the rate i^(n), for some function z^, which is usually linear as a function of n. The time 
evolution of the state of this self regulated gene is described by a pair of time continuous 
stochastic process N(t) and Y(t), where N(t) gives the number of proteins present in the 
cell at time t and where Y(t) takes the values and 1 corresponding to the off and on 
states of the promoter. The usual way of simulating N(t) and Y(t) proceeds by running 
the Gillespie algorithm (see e.g. [15j, and [7j). The mean steady state expression level is 
thus obtained through Monte-Carlo experiments. 



A regulatory network for efficient control of transgene expression 



A more elaborate gene network consists of three genes. A first gene encodes a transcrip- 
tional repressor. Because this gene is expressed from an unregulated promoter, it mediates 
a stable number of repressor. This repressor binds to and inhibits the promoters of the 
two other genes, coding for a transactivator protein and for a quantifiable or a therapeutic 
protein, respectively (Figure [2]A) . The activity of the repressor is inhibited by doxycycline, 
a small antibiotic molecule that acts as a ligand of the repressor and thereby controls its 
activity. Addition of the antibiotic will inhibit the repressor and relieve repression, allow- 
ing low levels of expression of the regulated genes and synthesis of some transactivator 
protein. This, in turn, allows further activation of the two regulated genes, in a positive 
feedback loop (Figure [2(3). When introduced in mammalian cells, this behaves as a signal 
amplifier and as a potent genetic switch, where the expression of a therapeutic gene can be 
controlled to vary from almost undetectable to very high levels in response to the addition 
of the antibiotic to the cells ( [22j , and |JJL] ) . 

Results 

Section [2] considers the Gillespie algorithm for simulating the time evolution of the number 
of proteins N{t) and of the state of of the related promoter Y(t), by focusing on the 
associated steady state distribution vr. In a previous work [llj, we gave an explicit formula 
for the steady-state associated to self-regulated genes, based on the embedded jump chain 
of the time continuous Markov process. Here we introduce a direct version dealing with 
the time continuous process. For concrete computation, we have to use a bounded state 
space with a total number of proteins that can not exceed a fixed but arbitrary integer 
A. ALGORITHM I gives an efficient way of computing vr and a useful tightness argument 
to show that the sequence of steady-state distributions measures indexed by A converges 
as A — > oo to the unique invariant distribution of the process defined on the unbounded 
state space N x {0, 1}. We also provide information on the variance of the gene product 
at steady state using generating functions and differential equations. 

SectionOproposes a mean field model with time delays, generalizing a model considered 
recently in this setting by p2], by including stochastic signals related to promoters. The 
feedback rates g{N{t)) and «:(iV(t)) are replaced by g(E.{N{t - 9))) and K(E{N{t - 6))). 
In [11], we studied the regulatory gene network in living cells; the obtained experimental 
results were in good concordance with the model's predictions. The related functions 
E{t) = E(iV(t)) and G{t) = PiY{t) = 1) sastisfy the time delayed differential system 

— = f,G{t) - uEit), — = g{E{t - e)){l - Git)) - K{E{t - e))G{t), 

which can be deduced from the chemical master equation, see Section [5j As it is well 
known, this kind of differential systems can possess oscillating or periodic solutions, see 
e.g. [5]. We show that there is a globally asymptotically stable equilibrium point when 
g is such that g{n)/n is decreasing as function of n and K{n) = k. We next provide 
ALGORITHM II for computing the steady state variance. Section U] deals with two time 
scales stochastic simulations and processes evolving at quasi-equilibrium. We also consider 
a generic dimerization process which occurs in most biochemical reaction networks, and 
provide an efficient ALGORITHM III for computing the first two moments of the related 
steady state distribution, which are then used when dealing with systems evolving at 
quasi-equilibrium. Finally, Section [5] focus on the regulatory network; we model extrinsic 
and intrinsic noise using a mean-field model, which permits to study the fiuctuations of the 
variance of the number of therapeutic proteins as function of the number of doxycycline 
molecules. 
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Figure 2: The regulatory network 

2 Mathematical models related to the self-regulated gene 

In what fohows, we consider the time continuous Markov chain known as the Gillespie 
algorithm for simulating the self regulated gene with arbitraries feedback mechanisms, and 
give precise formulas for the related steady state distribution vr. We shall see in Section 
15.41 a semi-stochastic or mean field model for the therapeutic network. The related steady 
state distribution is obtained as the product of steady state distributions of sub modules 
corresponding to self regulated genes. A complete understanding of basic modules like the 
self regulated gene is thus fundamental for understanding the global network, see e.g. [18] 
and Section [5.41 For more details on the model, see [llj . 

The module is composed of a promoter and a gene. Its time evolution is given by the 
following set of chemical reactions: 

r'±lii>, (DJ^v, 1 = 0,1, 

represents degradation of gene product when they are n molecules, here proteins (V), and 
protein production, where / = means that the promoter is off: no transcription factor (a 
complex composed of gene product) is bound to some operator sites near the promoter, 
so that the RNA polymerase can't bind well in the neighborhood of the basal promoter. 
We assume here that the transcription rate hq is such that /Uq ~ 0. When 1 = 1, meaning 
that the promoter is on, transcription occurs at a rate /^i = fi. The fluctuations of the 
state of the promoter are described by the following reaction 

Co ^1' 

K(n) 



(n-lA) 



(n,l) 



(n+1,1) 



9{n) 



K{n) 



u{n) 



(n-1,0) 



(n,0) 



(n+1,0) 



Figure 3: Visualization of the state space as a strip. The possibles transitions are repre- 
sented by the arrows with corresponding rates. 



where Oi, I G {0, 1}, indicates the state of the promoter. The transitions from the on to 

off states Ci-^Oo occur at rate K,{n), and the reverse reactions at rate g{n). Here g and 
K are two functions of the number of proteins modeling positive and negative feedback 
loops. 

Basal activity is introduced at the level of the reaction 00^^01, by supposing that 
(7 > 0. The Gillespie algorithm for simulating the above chemical reactions is a bivariate 
Markov process r]{t) = {N{t),Y{t)), with N{t) ^ A and Y{t) = 0, 1, where N{t) denotes 
the number of proteins present in the cell at time t, and Y{t) represents the state of the 
operator. The time evolution of Y{t) is coupled to that of N(t) to model auto-regulation, 
using functions g and k. For small time interval {t,t + h), /i ~ 0, the probability that the 
operator switches from the off to the on state is of order g{N{t))h. 

Let plit) = PiN{t) = n,Y{t) = 0) and pl,{t) = P(iV(t) = n,Y{t) = 1) give the 
probability of having n proteins at time t when the states of the promoter are Oq and Oi , 
respectively. We assume here that ^ n ^ A for some fixed but arbitrary integer A. The 
related Gillespie algorithm is given as a time-continuous Markov chain r](t) = {N(t), Y{t)) 
(see e.g. [Hj), where N{t) G {0, 1, • • • , A} and Y{t) G {0, 1}, with transition rates given 
by 

P{{n, y), (n + 1, y)) = fiy, P{{n, y), {n - 1, y)) = v{n), 
P{{n, y), {n, 1 — y)) = K{n) when y = 1, and , 
P((n, y), (n, I - y)) = g{n) when y = 0. 
The chemical master equation associated to the reaction scheme is then given by 

^ = M-i{t) - Kit)) + u{n + - Hn)pUt) (2.1) 

+ (-l)^(K(n)pi(t)-<7(n)pO(t)), 

where s G {0, 1}, see e.g. [23] . 

The steady-state distribution vr associated with (12. ID is obtained by letting t ^ oo: vr 
is defined as 

7r„(0) = lim p°(t) and 7r.„(l) = lim pi{t), 

t—*00 t^QO 

and solves the linear system obtained from (j2.ip by imposing dpf^/dt = 0: 



= /Xs(7r„_i(s) - 7r„(s)) + i^(n + l)7r„+i(s) - z^(n)7r„(s) 



+ {-iy{K{n)TTn{l) - g{n)TTnm, S = 0, 1. 

7r,i(0) is the probability to find n proteins and that the promoter is off; 7r„(l) is defined 
similarly but for the on state. The probability of observing n proteins at equilibrium is 
just 7r„(0) + 7r„(l). 

In what follows, we derive the steady state distribution vr. This probability measure 
is used in the mean-field delayed model of Section [3] and in Section for the study of the 
network. 



2.1 Computing the steady-state 



In this Section, we assume that ^uq = 0, and that = /i, where an explicit formula for 
the steady-state is available. The degradation propensity function z^(n) and the feedbacks 
«;(n) and g{n) are arbitrary positive functions. 

The method of generating functions can be used in some particuliar special cases 
to compute the invariant measure, see [31] for the simple case whitout feedback with 
1/(12) = u ■ n, K{n) = K and g{n) = g, or [2T] for the case with linear negative feedback 
iy{n) = ly ■ n, K{n) = k ■ n and g{n) = g. Although this method provides a powerful tool 
for analytic description, the method of generating functions is very particular in the sense 
that a little change in the form of one of the feedback propensity function can induce 
major changes in the generating function, and for each particular propensity function one 
has to derive the whole set of equations anew. Furthermore, an explicit form for the 
generating function can only be found when the feedback propensity functions are simple, 
either constant or linear in the protein numbers. In practice, the feedback propensity 
functions are related to the number of sites in the promoter on which the proteins bind, 
either directly in monomer form or in more complicated bound forms like dimers or higher 
order polymers, see [TO] . 

In [11], we presented a general formula for the steady-state for arbitrary degradation 
and feedback propensity functions. The method relies on the asymptotic behaviour of the 
jump matrix of the embedded discrete jump chain. We provide here a direct version of 
this method allowing to compute the invariant distribution of the time continuous Markov 
process directly. We recall that we consider a bounded total number of protein N{t) < A, 
a restriction that is biologically meaningful due to the finite volume of a cell but that is 
mainly supposed for technical reasons since the formula is recursive and we have to find 
a starting point (7rA(0), 7rA(l)) to begin with. However, the condition is not restrictive 
and we show in Theorem [2] that the sequence of steady-state distributions indexed by 
the boundary A converges weakly to the unique invariant distribution of the unbounded 
process on N x {0, 1}. 

Let us define the transfer matrices 



Ctr; 



j/(n + 1) 



g{n)+u{n) 
K{n) 



< n < A, 



Ki) 



and the vector w\ = {k{A), g{A) + J^(A)). 



9(0) ^ 
«(0) 1 
9(0) ^ 



Theorem 1 For < n < A, the invariant distribution 

7r„ = (^„(0),^„(1)) = hm {P{{N{t),Y{t)) = {n,0)), P{{N{t),Y{t)) = (n, 1))) 

of the time- continuous Markov process {ri{t)}t>o on the strip {0, 1, . . . , A} x {0, 1} is given 
by 



TTn 



WA«A-iaA-2 

Za 



■ OLr. 



for < n < A, and tta 



WA 
Za' 



with the normalization constant 



A-i 



Za = wa- (1, 1)^ + ^ w^AaA-iaA-2 ■■■aj ■ (1, 1) 

j=0 



Proof : At equilibrium, equations 12.11 reads 



=7roi?o + vTiDi, 

=TTn-lU + TTnRn + T^n+lDn+l, < n < A, 
=TTA-lU + TTaRa- 



with the 2x2 matrices U 



Rn 








v{n) 
v{n) 



and 



-(c/(n) + 5f(n) 

K{n) —{K{n)+v{n)+^) 



for 1 < n < A — 1, and the boundaries Rq 



Da 



v{K) 
i/(A) 



and Ra 



-5(0) 5(0) 
k(0) -(k(0)+^) 



-{g{k) + v{K)) g{K) 

k(A) -(k(A) + z.(A)) 



Simple linear algebra shows that the above defined matrices a„, < n < A, satisfy the 
relation 7r„ = nn+io^n- Indeed one only has to check that for < n < A, the 2x2 matrices 
Un-iU + Rn are invertible and that the matrices a„ solve the matrix continuous fraction 

ao = — DiRq ^, 

an = - Dn+i{an-iU + RnY^ , < n < A. 

□ 



The formula given in Theorem [T] must be used with care numerically since, when A 
is large, both the numerator and denominator rapidly diverge. It can be improved with 
the following normalization algorithm, that is exactly the same as the one used for the 
embedded jump chain in [11]. Let R?.g := {w = {wi,W2),wi,W2 € lR>o} with the 1-norm 
\\w\\ := w ■ (1, 1)-^. 

ALGORITHM I 



(STEP 1): Define for n = A - 1 to as 



VA 



WA 



and Vn '■- 



Vn+l^n 



IWaW \\Vn+ian\\ 

(STEP 2): Given the Vn, define vo = vq and, for n = 1 to A, set 

||w„a„_i|| • ||-Un-ian--2|| • • • ll-wiaoll ' 
(STEP 3): Compute the steady-state distribution as 

A 

V. 



TTn = 7^, where Va ■='y^Vi- 1. 
Va ^ 



1=0 



It immediately results from their definition that the Vn and Vn satisfy \\vn\\ = 1, 

^ 't;AaA_iaA_2 ■■■an 

II^'aOa-iII • ||^'A-iaA-2|| • • • ll^^n+ianll • ll'Wnan-ill ' ' ' Il-Wiaoll ' 

the denominator of the above expression is independent of n and v\ is proportional to w\. 
Hence Vn is proportional to the invariant measure 7r„, and (STEP 3) of the algorithm 
effectively compute the actual steady-state distribution. 

Proposition [1] below provides conditions under which the normalization constant V\ 
remains bounded as A is large. The function z^(n) gives the monomer degradation rates 
for n proteins, and is assumed to be increasing with i/(0) = 0, and strictly positive for 
n > 1. Usually, i^(n) is taken to be a constant times n, here we assume the less restrictive 
condition that inf v[n)/n is stricly positive to allow situations where for example proteins 

n>l 

that are present as chemical complexes (dimer, trimer,...) can not be degradated, or 
situations where v{n)/n ^ cx) as n — > (X). 



Lemma 1 If inf v{n)/n is stricly positive, there exists a constant k > depending only 



n>l 



on /X, u, K, g (and not on K) such that for all n > 1, Hvnan-iH > nk. 

Proof: Each Vj lies in the line segment 5 C between the points (0,1) and (1,0) and 
depends on A. To break this dependence, we prove the results for an arbitrary vector 
v = {t,l-t)eS,te [0, 1]. 

van = ^(*,t + (1 - t)) = ^{*, 1), 

with * > for all n. Hence, uniformly in S, 

||fa„|| zv(n) ^ vin) 
" " > > inf =: /c > 0. 



n nfi n>i nfi 

□ 

With Lemma [1] we can give bounds uniformly in A: 

Proposition 1 Assume that inf v[n)/n is stricly positive. There exists M > 0, depending 

n>l 

only on fj,, u, k, g (and not on A), such that 

A 



1<V^ = Y,\K\\ <M. 



n=0 

Proof : Notice that ||?;o|| = ||^o|| = 1- The 1-norm of Vn is 

n n \\VJ\ 1 



||i'nan-i|| • • • lI'Viaoll ||fnan-i|| • ' ' \\viao\ 

and with Lemma [1] 



^A = i + EiM<i+EV = ^'"^-^- 



n! 

n=l n=l 



The first aim of lemma [T] is to show that the preceding algorithm is efficient. But 
this lemma can also be used to demonstrate that the steady-state distribution of the 
continuous-time process ry(t) converges when A ^ oo. Moreover, we can show that this 



limiting distribution is the invariant distribution of the process on the unbounded strip. 
It is necessary to adapt our notations in order to show the dependency in A. Henceforth, 
we win write vri^^ and instead of vr^ and Vn- 

Until now, we have considered a finite state-space by fixing a maximum number A of 
proteins. It is always easier to deal with finite Markov chains, but the main reason is 
because our algorithm to compute the invariant measure works in this case. Even if 
this model is realistic (an organism cannot contain an infinite number of proteins), it is 
interesting to show that the steady-state does not depend asymptotically on this maximum 
number A of proteins. In other words, we want to show that under a sufficient condition, 
the invariant measure converges in A. We define vr^^) = (7rl'^^)o<n<A 

and consider the 

familiy of probalility measures n = (7r(^))AeN embedded in N x {0, 1}. 

Theorem 2 The sequence of invariant measures vr^^^ converges weakly as A ^ oo to the 
invariant distribution of the process defined on the unbounded strip. 



Proof: A sequence {Pn} of probability distributions on a countable and discrete state 
space E converges to the probability distribution P on ii^ if and only if each of its sub- 
sequences {Pn'} contains a further subsequence {Pn"} that converges to P. A family 11 
of probability distributions on E is called relatively compact if every sequence of elements 
of n contains a convergent subsequence (to a probability distribution on E), and tight if 
for every positive e there exists a compact set K such that P{K) > 1 — e for all P in 11. 
Tightness implies relative compactness, see e.g. j4]. 

We first show that the family of probalility distributions 11 = ('7r(^))AgN is tight. Lemma 
[T] implies that there exists k not depending on A such that 



Since Va > 1, we have also 



II (A) II ^^n J- 

Hence, for all e > there exists M,. (not depending on A) such that 

OO CO ^ 

II J II - jl /jj 

j=M, j=M, •' 

Consequently, H is relatively compact and there exists a convergent subsequence (vr^^''^) 



km 



of H. Define 7r(°°) = lim^^oo ^r^^*^^. We check now that 7r(°°) is the invariant distribution 
of the continuous-time process defined on the unbounded strip. For each n G N we have 

= hm 4^*=) 

fe— >oo 

= hm (vri^\) Dn+i + 4^*=) Rn + U) 

fe— >oo 

— T^n+l ^n+l + T^n ^ ^• 

This shows that 7r(~) is indeed an invariant distribution of the limit chain. In fact, what 
preceded is also valid for any converging subsequence. Besides, the invariant distribution 
is unique because the process is irreducible. Thus, we can conclude that vr^"^) converges as 
A ^ OO to 7r*^°°^ which is the invariant distribution of the process defined on the unbounded 
strip. □ 



2.2 The method of generating functions for the mean and variance 



We consider the problem of computing the mean and variance of the gene product N{t) at 
steady-state, that is when t is large, using generating functions. As discussed in Section [2711 
generating functions allows in some simple cases to compute the steady-state distribution, 
see e. g. [31] or [21] . with simple feedback functions. Here we show that even when the 
feedbacks are arbitrary, the method can be used to gain insight in the relations between 
variance, mean and probability to be ON. To avoid boundary conditions, we suppose here 
that the number of protein is arbitrary (A = oo), and the only asumption concerning the 
propensity functions is that 1/(12) = v n, i. e. degradation is directly proportional to the 
number of proteins, while K{n) and g{n) are arbitrary positive functions and ^0 is not 
necessarily 0. 

Let 7r„(y) = lim P{N{t) = n,Y{t) = y) and 



t— >oo 



a(z) = ^7r„(l)z", /3(z) = ^7r„(0K 

be the partial generating functions related to the steady-state, and 

R{z) := Y.M'^Xn) - 7r„(0)5(n))z". 

From the master equation (12. ip at equilibrium, we deduce 



= -R{z)+fii{z-l)a{z) + u{l-z)^^, and = +R{z)+Mz-'^)l3{z)+H^-z)'^^^^^ 



dz dz 
Adding these two relations and assuming that z 7^ 1 gives 

da{z) ^ d/3{z) _ fiia{z) + iJio(3{z) 



dz dz V 

This shows that the following general relations hold: 

E(iV(oo)) = ^P(y(oo) = 1) + ^P(y(oo) = 0), 

Var(iV(oo)) = ^^l.^i + + E(Ar(oc))(l - E(Ar(oc))), 
where we recall that 

a(l) = P(y(oo) = 1), ^|.=i = E(iV(oo)y((X))) and ^U=i = E(iV(oo)(l-yM)). 

dz dz 

These formulas make sense since, when the promoter is on (resp. off), the process evolves 
as a birth and death process with birth rate [i\ (resp. /io) and death rate i^n, and has 
a Poisson distribution of parameter [ixjv (resp. Ho/v) as a stationary distribution, of 
mean and the variance given by iii/v (resp. Hq/v). The last term is related to promoter 
fluctuations, see the following Example. 

Example 1 Assume that g{n) = g and n{n) = k. Let F[t) = ^n:>QnP{N{t) = n,Y(t) = 
1). The master equation yields 

^ = gEit) + ^iG(t) - Fit)ig + k + z.), 

^ = /iiG(t) + //o(l - G{t)) - uE{t), 
dt 



where E{t) = E(iV(t)). Using the differential relation dG{t)/dt = g(l — G{t)) — kG, one 
gets that 

G{oo) - ^ 



g + K 

Notice that G{oo) is related to the stationary law of the 0/1 Markov chain given by the 
transition rates p{0,l) = g and p{l,0) = k, which models the fluctuations of the state of 
the promoter. Then 

Var(iV(cx))) = —^l=i + ^-^l=i + E{^)-E{^f 
= ^F{oo) + ^H{oo) + E{oo)-E{oof, 

where we set H{t) = E{t) — F{t). It follows that 

Var(iV(oo)) = ^G(oo) + ^(1 - G(oo)) + ^F(oo) + ^H{oo) - E{oof. 

V V V V 

We finally obtain, after some algebra, 

Var(Ar(oo)) = ^G(oo) + ^(1 - G(oo)) + ~ ^°^' var(y(oo)), 

V V Ti + T2 V 

where the characteristic times ti and T2 are defined by 

1 , 1 

Ti = — and T2 



The interpretation of this formula is obtained by observing that, when the promoter is 
on with probability G{oo), the process evolves as a birth and death process with steady 
state distribution given by a Poisson distribution of parameter [ixjv. The interpretation 
of the second term is similar. The third term corresponds to the variance of a Bernoulli 
random variable (on/off) multiplied by a factor accounting for characteristic times related 
to protein degradation and promoter fluctuation. When fiQ = 0, the coefficient of variation 
can be then given as 

2 ^ Var(iV(c^)) ^ 1 T2 Var(y(oo)) 

^ E(iV(oo))2 E(iV(oo)) TI + T2 E(y(oo))2 ' 

as given in ]30^ . The above relations yield moreover that 

pg g{g + v + 

where p = p^i/v, and it follows that CV^ is decreasing as a function of g and increasing 
as a function of k. 



3 Mean-field models 

Most mathematical works on gene networks, like [13j , consider networks with linear transi- 
tion rates in which the state space of each chemical species equals N. Results on networks 
involving catalytic transitions rates are very scarce. [24], also focus on such models but 
allow time dependent transition rates. In this situation, one gets interesting linear differ- 
ential equations for the first and second moments, and for covariance functions. When 
some state space is finite, boundary effects transform the equations which become more 
involved. 



The model for the self-regulated gene defined in Section [2] is similar to a mathe- 
matical model for an epidemic of schistosomiasis provided by [25] and [26]. In their 
model, the authors consider a similar Markov chain, where they replace every external 
random variables in the transitions probabilities by functions of their expected values. 
This means for example that the transition rate P{{n,y), {n + l,y)) = y^i is replaced by 
P((n, y), (n+1, y)) = E(y(t))/x, and P((n, y), (n, l-y)) by K{¥.{N{t)))y+g{¥.{N{t))){l-y), 
since for this last transition, the external random variable coresponding to this transition 
is N{t). One gets a time-nonhomogeneous Markov chain. One can show that the pair 
(E(A^(i)),E(y(t))) converges to a limit (E(A^(oo)), E(y(cx)))) (see e.g. I25j). This model 
is then asymptotically equivalent to the model of the self-regulated gene given in Example 
[T] where = = E(y(oo))/i: it is easy to check that the stationary distribution of 
N{oo) is Poisson of parameter /ii/z^ when 1/(12) = vn. The mean and the variance are 
then equal to ^i/v. The behavior of the propagation of noise in gene networks can be 
counter-intuitive, as shown for example by [32], where the mean gene expression at steady 
state is increasing as function of some inducer, but where the variance exhibits a peak. 
The same phenomenon occurs with the therapeutic network of Section |5l This shows that 
this model can't predict this qualitative behavior. We shall adopt a different point of view 
below by conserving the external variable Y{t) and taking only the average of N(t). 

Models in which one considers the average of N(t) in transition rates, but not involving 
promoters and therefore Y{t), have been considered more recently in the gene regulation 
setting by [17], where the authors introduce biologically meaningful delays in feedback 
interactions. They replace occurences in transition rates of n{N{t)) and g{N{t)) by ex- 
pressions involving their expected values, that is by K(K(N{t — 6))) and g(E(N{t — 6))) for 
some delay 9. Time delays are biologically very meaningful since, in fact, proteins move 
around at random and the delay 6 might represent the average time a protein takes to move 
back in the neighborhood of the promoter. As stated in the Introduction, their models 
however do not involve Y{t) € {0, 1}, and therefore promoters. For the self-regulated gene, 
assuming linear degradation transition rates of the form ^{n) = vn, the limiting steady 
state is again Poisson, so that this model is not completely satisfactory for predicting the 
propagation of noise in gene expression levels. 

In a previous work, [11], we proposed a mean-field model, which includes promoter 
states and time delays, extending a model of [5]. The regulatory network was studied in 
living cells, and the experimental data were in good agreement with the model's predic- 
tions. We also provided a rationale for introducing mean- field interactions: The many 
steps and relatively slow transitions between states of chromatin in mammalian cells be- 
tween the permissive and the non-permissive states of chromatin may dampen the noise 
that stems from the stochastic binding of a low number of activator proteins to the pro- 
moter and from noise amplification resulting from the gene auto-activation feedback. In 
this setting chromatin may act as a noise-filtering device that allows graded response from 
stochastic events. This new Markov chain r]{t) evolves in the same state space, but has 
transition rates given by (we assume that /xq = and ^1 = > 0) 

q{N{t + h) =n+l,Y{t + h) = y\N{t) = n, Y{t) =y)=y ijh + o{h), 

q{N{t + h) = n,Y{t + h) = l-y\N{t) = n,Y{t) = y) = K{E{N{t-e)) h + o{h) when y = 1, 
q{N{t+h) = n,Y{t+h) = l-y\N{t) = n,Y{t) = y) = g{E{N{t-e))) h+o{h) when y = 0, 
and 

q{N{t + h)=n-l,Y{t + h)= y\N{t) = n, Y{t) = y) = 



V n h + o{h). 



The main difference with the basic model is that transition rates hke p{{n, y), (n, 1 — y)) = 
K{n) are replaced by time non-homogeneous rates gt((n, y), (n, l — y)) = k{K{N [t — 9))) , so 
that the related Markov chain is time non- homogeneous. Let us denote by Qt the related 
transition matrix at time t, of instantaneous steady state distribution vr*, with n^'Qi = 0. 
In what follows, we shall use the family of transition matrices Q(b,c) given by 

q(b,c){{n,y), {n + i,y)) = y q{b,c){{n,y), {n - i,y)) = v n, 

Q(b,c){{n,y), (n, 1-y)) =b when y = 1, and q(b,c){{n,y), (n, 1 - y)) = c when y = 0, 

of steady state distribution vr^^''^^ Then Qt = Q(b{t) c{t))i where b{t) = K{K{N{t — 6))) and 
c{t)=g{E{N{t-e))). 

When dealing with time non-homogeneous Markov chains, the main problem is that 
the law of the stochastic process P{rj{t) = (n, y)) does not necessarily converges toward the 
limiting steady state distribution (when it exists) limf^ooTi"*, and can lead to oscillations, 
as provided for example in [5j, or in [T7]. The first thing we can do is to check the 
asymptotic behavior of the functions b{t) and c{t). Suppose that these functions converge 
toward positive numbers 6(oo) and c(oo). Then one ask if the following holds 

lim P{ri{t) = (n, y)) = lim 7r*(n, y) = (n, y) ? (3.1) 

t— >oo t— >oo 

Assume that this is true: Then one gets that the steady state behavior of the self regulated 
gene is given by computing the steady state and the basic statistical descriptors related 
to the Markov chain of transition kernel Q(6(oo),c(oo))) which is much simpler. We will see 
that in such a situation, one can get exact formulas for the mean and for the variance of 
the number of proteins (see also Example [T]). 

()3.ip holds under fairly general assumptions. Theorem [6] of the Appendix gives that 

lim P(r/(t) = (n,y)) = T^'^^^°^^^<°°^\n,y), 

t—i-OO 

when the limiting process of transition kernel Q(b{oo),c{oo)) is ergodic. 



i^/Ht) - ^Jh{oo)fdt < +00, and / { ^/ci£j - ^/c(oojfdt < +oo, (3.2) 
Jo 

and under an additional hypothesis which is automatically satisfied in our model (see the 
Appendix). 

The chemical master equation yields differential equations for G{t) = P{Y(t) = 1) and 
E{t) :=E{N{t)), given by 

dE 

— =^G{t)-vE{t\ 

dC 

— = c{t){l-G{t))-b{t)G{t). (3.3) 

Remark 1 f^, and \17^ , consider delayed differential systems similar to the system given 
by 'J\) with solutions oscillating toward limit cycles. The steady state exists however only 
when the solutions of this system converge as t ^ oo. We will see that this is the case for 
a linear positive feedback. 

ALGORITHM II 

(STEP 1): Check the convergence of the orbits of the system given by equations (|3.3p . 
for a given initial condition G(0), E{t), —O^t^O. When convergence holds, proceed to 
the next step 



(STEP 2): Let Coo = -E'(oo), with 
Coo = /i/z^G(cx)). Solve the equation 

5t(eoo)(l - -Coo) - K(eoo)-eoo = 0. 

(STEP 3): Let 

T2 ■= {g{eoo) + K(eoo))""^. 
Compute the coefficient of variation as 



2 ^ Var(jV(oo)) ^ J_ (1 - G(cx))) 

^ E(iV(oo))2 eoo ri + r2 G(oo) ^ 



where ti = 1/v. 

For more insight in these formulas, see the remarks in Example [H 



3.1 Convergence for linear positive feedbacks 



Recall that c{t) = g(E(N{t — 6)) and assume that K{n) = k. We generalize the linear case 
by assuming that g{x)/x is decreasing. Notice that even when g{x) is affine in n, this does 
not mean that the stochastic system is linear: for example, assuming fast promoters or 
a quasi-equilibrium, one gets a time nonhomogeneous birth and death process with birth 
rate fic{t)/{c{t) + k) and death rate I'n. We prove below that the above dynamical system 
is such that there is a globally asymptotically stable critical point {E{oo),G{oo)) with 

G{oo) > and c(oo) > 0, 

(see [12], for more general mathematical results). In this case, the mean and the variance 
of the number of proteins are obtained by studying the transition kernel Q(k,c{oo))- 
In the following, we focus on the system (j3.3p that reads in our setting 

— =^,G{t)-uE{i), 

— = g{m{t - e)m - Git)) - KG{t), (3.4) 

where g{-) is continuously differentiable and increasing over M_|_, g{0) > 0, is decreas- 
ing, the initial condition E{t) is continuous and non-negative over [—6, 0] and < G(0) < 1. 
We use the notation / for the derivative df /dt. We proceed step by step to show that 
condition (|3.2p holds. 

In Lemma[2l we prove that the evolution equations defining the system are well defined, 
providing a unique solution, then we show in Lemma [3] that the system converges to the 
unique biologically meaningful critical point of the system and in Lemma U] that the speed 
of convergence is exponential. The methods used are adapted from [12]. Finally, using 
our hypothesis on the function g, it is easy to conclude that the condition (j3.2p holds, and 
the main result is stated in Theorem [3l 

The theory of delayed differential equations is very different from the usual theory of 
differential equations, here the initial condition is no more a point in the finite dimensional 
space M+ x [0, 1] but a continuous nonegative function E{t) over the interval [—9, 0] and 
a value G(0) G [0,1]. To solve the system (j3.4p . we have to first integrate the second 
equation over the interval [0, 6], then plug the solution in the first equation and integrate 
using the variation of constant over the interval [0, 0] and begin the whole procedure anew 
over the interval [0,20] with initial condition given by E{t) over the interval [0,6], and so 
on. 



Lemma 2 Existence and unicity 

For any initial condition E{t) non-negative and continuous over [—0,0] andO < G{0) < 1, 
there exists a unique solution of the system ^c!.4\ ) defined over [0, +oo). Furthermore, 

< E{t) < max{E{0),fi/iy}, t > 0, 

and 

< G{t) < 1, t > 0. 

Proof: For any initial condition < G(0) < 1 and E{t) non-negative and continuous over 
[—9, 0], (|3.4p admits obviously a unique solution over [0, 9]. If G(0) > 0, then by continuity 
G remains strictly positive over some open intervall to the right of 0. If G{0) = 0, then 
according to the second equation of (j3.4p . G{0) > and the same conclusion holds. The 
same reasoning shows that G < 1 over some open intervall to the right of 0. Clearly, if they 
exist, to = inf{t € {0,9], G(t) = 0} and ti = mf{t G (0,6*], G{t) = 1} are both strictly 
positive. By definition, G{to) < and by continuity, G(to) = 0. The second equation of 
()3.4p entails G(to) > 0. We have a similar contradiction for ti, thus < G{t) < 1 over 
{0,9]. The variation of constant formula entails 

< E{t) < max{£;(0),/i/i/} over (0,6*]. 

Iterating the procedure provides existence and unicity of a solution defined over [0, +00) 
and the preceding inequalities are preserved. □ 

We are interested in the possible equilibria of (j3.4p in IR+ x [0, 1], i.e. the solutions {Eq, Gq) 
in M+ X [0, 1] of 

= /xGo - uEo, = g{Eo){l - Go) - kGq. 

Clearly Go = and Gq = 1 lead to contradictions. We thus have < Go < 1 and 
consequently £"0 > 0. Plugging Gq = j^Eq in the second equation yields 

— -^(- - Eo) - K. 

If g{0) > and is decreasing over (0,-|-cx3), then ^^(^ — E) is strictly decreasing. 
Since it starts at +00 and becomes ultimately negative, we conclude to the existence of a 
unique solution {Eq, Gq) G x [0, 1]. 

We will use the fiuctuation Lemma [6] given in the Appendix to prove the convergence to 
the critical point {Eq,Gq). 

Lemma 3 Convergence 

For any initial condition E{t) non-negative and continuous over [—9,0] and < G(0) < 1, 
the unique solution {E{t),G{t)) converges to {Eq,Go) as t ^ 00. 

Proof: The fluctuation Lemma [6] and the monotonicity of g imply that 

0>fiG-uE, > g{E){l - G) - kG 

0<fiG-uE, < g{E){l - G) - kG. (3.5) 

We prove the last inequality to exemplify the method. We choose tn T +c« so that 
G{tn) G and G{tn) ^ as n — > +00. Since the sequence {E{tn)}n>i is bounded, 
there exists a subsequence {tn^)k>i so that E{tn^) converges as /c ^ 00 to a certain value 
that we call i?oo- Evaluating the equation for G over the subsequence {tn,,)k>i and letting 
k —>■ +00, we get 

= g{E^){l -G)-kG< g{E){l -G)-kG 



since g is increasing and G < 1. The proof of the other inequahties in (j3.5p is similar. 
We already know that < G, ^ > and G < 1, and ([33]) entails G < and G > ^E, 

and in particular E < < ^. Consequently 

—E<^G< g{E){l - G) < g{E){l - -E), 



hence 



n<'4B{t^-E). 
~ E ' 



Repeating the same argument for E_, one gets 



K > 



E V 



By assumption, ^^(^ — E) is decreasing, so that the two last equations then give that 
E < Eq and E > Eq. Clearly we have E_ = E = Eq, so that E{t) converges as t — s- +oo. 
According to Lemma 3.1 in [8], E(t) and its first two derivatives being bounded on [0, +oo), 
we have lim E{t) = and the relation E = jjiG — lyE entail the convergence of G{t) as 

From this Lemma, we deduce that the propensity function 

a^{t,y)=g{nN{t-e))) y 

converges to ^oo V = g{Eo) y ast oo. To show that the convergence speed is exponential, 
we use Theorem [7] cited in the Appendix. 

Lemma 4 Exponential convergence 

The convergence of {E{t),G{t)) to (Eq^Gq) is exponential. 

Proof: Near a critical point, the asymptotic behaviour of the system is determined by 
the asymptotic behaviour of the linearized system 



m 

G{t) 

where A and B are the matrices 



A- 



m 

G{t) 







-{9{Eo) 



+ B 



B 



E{t 
G{t 





g'{E^){l-G^) 



We show that all roots A of the caracteristic equation det(A + e B — XI) = have 
negative real parts. The caracteristic equation is here 



A2 + X{i, + g{Eo) + k) + i^igiEo) + k) - /U5'(^o)(l - Go)e 



0, 



and all roots A of this equation have negative real part if and only if all roots z of 

H{z) := {z^ +pz + g)e^ + r = 
have negative real parts, with 

p:={v + g{Eo)+K)e, q := v{g{E^) + K)e\ r := -f,g'{Eo){l - Go)e\ 



and the change of variable z := X9. According to Theorem [7] given in the Appendix, since 
r < and 



P 



2q + ly^e^ + {g{Eo) + K^e'' > 2q, 



2q2 



we have to check that —q < r < and r sm.{a2) / {pa2) < li where a2 is the unique 
root of the equation cot(a) = (a^ — p)/q. which hes in the interval (27r,37r). The second 
inequahty is clear since r/p < and sm(x)/x > on (27r,37r). For the inequality — r < q, 
notice that g'{x) < g{x)/x since g{x)/x is decreasing, and using the equilibrium equation 
g{E^){l - Go) = kGo = fEo, we have 



-r = fig'iEo){l - Go)e^ < ^^(1 - Go)9'' = ukO^ < u{k + g{Eo))e^ = q. 

Hence all roots of the characteristic equation have negative real parts and the system 
is asymptotically stable. Since our system is autonomous, asymptotic stability implies 
uniform asymptotic stability. According to theorem 4.6 in [19], the convergence is expo- 
nential. □ 

Using the exponential convergence of E,{N{t — 9)) and the hypothesis on g, it is now easy 
to show that condition (13.21) is satisfied. 



Theorem 3 Assume that g{n) = go + gin with go > and that K{n) = k. The limiting 
distribution of the time-nonhomogeneous process {N{t),Y{t)) is such that 

hm P{N{t) = n,Y{t) = y) = T^t'^^^'^Xv) , 

where 7r^9ieac),i-^) j^g steady state distribution given by Theorem \^ for a self regulated 
gene with the simpler transitions 

P{{n, y), (n + 1, y)) = y fi, P{{n, y), (n - 1, y)) = vn, 

P{{n, y), (n, I - y)) = k when y = I, P{{n, y), (n, I - y)) = g{Eo) when y = 0. 
The steady state coefficient of variation ofN{oo) is given by 

^ Eo^ n + T2g{Eo)' 

where 

1 1 

g{Eo) + K V 

Proof: According to Theorem[6]in the Appendix, we only have to show that condition (j3.2p 
holds. Using the positiveness and boundedness of g{E(t — 9)), < ^(O) < g{E{t — 9)) < 
g{^/u), and the expansion 



V9(B(< - 9)) + \/9(^' 

we have 



and condition (|3.2p is in our case equivalent to 



POO 

/ {giE{t-9))-g{Eo)fdt<^. 
Jo 



Let £ be positive, e < Eo and Tg be such that | E{t — 9) — Eq \< £ for all t > T^. Using the 
mean value theorem, for all t >T^, there exists a in the interval delimited by E{t — 9) 
and Eo such that 

I g{E{t - 9)) - g{Eo) \= g'{it) \E{t-9)-Eo\. 



Since for all t > T^, Eq — e < mm(E{t — 9), Eq), and furthermore < g'{x) < g{x)/x and 
g{x)/x is decreasing, 

I gm - 9)) - giEo) \ = g'{it) \ E{t -9)-Eq\<^^\ E{t -9)-Eo\ 

< ^^^^ I E{t -9)-Eo \=: Le \ E{t - 9) - Eo |, 
Eq - e 

and finally with the exponential convergence of E{t — 9) to Eq, condition (|3.2p holds 

/ {g{E{t - 9)) - g{EQ)fdt < / {g{E{t - 9)) - g{EQ)fdt 
Jo Jo 

POO 

+ lJ {E{t - ( 
JTf 



£'o)^dt < oo. 



□ 



Remark 2 When the positive feedback rate g{K(N{t — 9))) is such that is increas- 
ing, for example when g is a polynomial of degree > 2, there can possibly exist several 
biologically meaningful equilibrium points and it can not be excluded that for some initial 
conditions the solutions of equation ^3.4^ oscillate endlessly. 

When g is constant but the negative feedback K,{E.{N(t — 9))) is an increasing function of 
M{N{t — 9)), the biologically meaningful equilibrium point is unique but similar application 
of the fluctuation lemma as in the proof of Lemma yields the trivial observation that 
E_< Eq < E , and oscillating solutions can not be excluded in this case either. 

4 Two-time- scale stochastic simulations 

The self-regulated gene and the network presented in the Introduction involve slow and 
fast species, like therapeutic proteins and activator dimers. We recall existing known prob- 
abilistic results concerning quasi-equilibrium. Let e > be a small parameter, which will 
be useful for describing fast species. In what follows, r]^{t) is a random vector describing 
the number of molecules of each species present in the cell at time t. For example, consid- 
ering the self-regulated gene, N^{t) gives the number of protein molecules at time t, and 
Y^{t) = 0, 1 gives the state of the promoter. The pair r]l(t) = {N^(t),Y^(t)) stands for the 
slow process. r]j{t) models the fast process, and the global process is r]^{t) = (t), r/^ (t)). 

A generic example of fast reaction is dimerization, as given by the chemical reaction 

where Ai represents protein monomers and "D protein dimers. Protein dimers form a fast 
species, while protein (involved in dimers or monomers) is a slow species. The rates of 
these reactions are fast when for example the rate constants Pt and (3^ are such that 
/3i = c_/e and = c+/e, for positive constants c_ and c+, when e ^ 0. In this setting, 
the number of protein monomers is then given by A^^(t) — 2D^(t), where D^{t) gives the 
number of protein dimers present in the cell at time t. Here 

r^l{t) = {N^{t),Y^{t)) and i^^{t) = D^{t). 

When e ~ 0, a quasi-equilibrium is attained, meaning that for given = k, one can 

assume a local steady state for the number of dimers. More generally, we assume that 



the slow process evolves in some finite space Eg = {1, • • • ,L}, and that, given k G Eg, 
'r]j{t) E N is described by a Markov transition kernel A^{t)/e (see below). We follow 
essentially |33j . We assume that the generator Q{t) = Q^{t) of the Gillespie algorithm 
r/^(t) can be decomposed as 

Q-{t) = -^A{t)+B{t), 

where A{t) and B{t) are matrix valued functions. Following ^33j, assume that A{t) has 
the block diagonal form 

■•• \ 
■•• 

••• A^{t)] 

where each block A^{t)/e is a transition matrix representing the transitions rates of the 
fast variables given r/f = k. The generator B{t) gives the slow transition rates and in 
particular transitions of the form {(k,u), {j,v)), where n, E N and k,j E Eg- Following 
[33j . we partition the state space E as 

E = Ei\JE2U---UEl, 

where each E/^ contains elements, with 

Ek = {cfci, efc2, ■ • • , Cfcrnj.}, k E Eg. 

Each Ek corresponds to some subset of {k} x N, with k ^ Eg. 

Hypothesis: We suppose that the process is time homogeneous, that is that both A[t) 
and B{t) do not depend on t, and that each generator A^ is irreducible with a unique 
invariant probability measure = ((T'^(efci), • • • , o"^(efcmj.)), such that a^A'^ = 0. 

Following [33j, each E^ can be aggregated, and represented by a single state /c, corre- 
sponding to a particular slow state; The Markov process rii^{t) of transition kernel is 
then approximated by an aggregated process fi^{t) defined by 

f{t) = kifri'{t)eEk, k = l,--- ,L. 

This process converges in distribution as e ^ toward a Markov process rj{t) generated 
by the kernel Q = {'ykj)k,jeEs, with 

Ikj = ^'^cr''{eku)B{eku,ejy), ky^j. 

u=l v=l 

4.1 Transcription with fast dimerization 

The model is similar to that given in Section [H with dimerization as a fast component, 
see e.g. [6], [7], or [16]. It is described by the following set of chemical reactions: 

7W^0, ^^M, / E {0, 1}, Oo + V ^Oi, M + M^V, 

K.{d) 

where T> represent dimers, g is function of the number of dimers, and the rates /3i and 
(3'^ involve a small number e > modeling the speed of dimerization, see below. The 



A{t) 



(A\t) 
A^{t) 



\ 







number N^{t) of proteins V present at time t is related to the number of dimers as 
^ 2D^(t) ^ N^(t), and the number of free monomers is such that A4 + 2P = V. 

The running process is a Markov process rj^{t) = {N^(t),Y^{t),D^{t)), ^ 2D^{t) ^ 
N^{t), t ^ 0. The dimerization process D^{t) is given by the transition rates 

Pivtit+h) = iil{t),D'{t+h) = D'{t) + l) = P%{N%t)-2D%t)){N%t)-2D%t)-l)h+oih), 

Pivtit + h)= vlit), D%t + h) = D'{t) - 1) = ptD'{t)h + o(/i), 

where the slow process is = (A^^(t), 

The transition rates of D^{t) depend on N^{t) but are independent of the state of the 
promoter. We can fit the setup of this Section by setting 

/3i = ^ and (3% = ^, 

for positive constants c_ and cj^. Then, the result of Section [J] yield that the slow process 
at quasi-equilibrium (e ~ 0) is well described in the above discussion: For a given slow 
state k = {n, y) , one gets 

g{n)= J2 ^^''''Hd)g{d), a^^'y^ = fi^, 

0^d^[?i/2] 

where the quasi-equilibrium stationary measure cr*-"'^-' corresponds to the stationary mea- 
sure /i" of the dimerization process (see below). A typical example is given by g{D^(t)) = 
XD^{t) + g{0), that is depends linearly on the number of dimers at time t. Then, at 
quasi-equilibrium one gets 

g{n) = f/'{d){\d + 5(0)) = AE„ + ^(0), where we set E„ := ^ d/i"(d), 

where [■ ■ ■ ] denots the integer part. The algorithms developped in Section[2]can be applied 
efficiently if one can compute the rates g{n). The next Section develops efficient algorithms 
for computing g{n) when g{d) is linear or quadratic in the number d of dimers. 

4.2 Dimerization 

Dimerization appears in most biochemical processes, and is usually considered as a fast 
reaction. The aim of this Section is to give mathematical statements relevant for com- 
putational purposes (see also [9], [7], or [23]). Given a fixed number of proteins n, the 
dimerization process is given by the reaction 

where we recall that M and D represent protein monomers and dimers. The infinitesimal 
transitions probabilities are such that 

P{D{t + h)=i + l\D{t) = i) = c+{n- 2i){n -2i-l)h + o{h), 

P{D{t + h)=i- l\D{t) =i) = c^ih + o{h). 
The stationary distribution /x" of the process is given explicitely by 

= l—y-t • ^, ^ i ^ n2, n2 := [n/2]. 



where 



i=0 



1 

(n - 2i)\il ■ 2* 



"2 



It can be shown that the generating function at equihbrium M(s) = ^ fi"'{i)s^ can be 

i=0 

written using confluent hypergeometric functions 



M(s) 



— ) 



lFi{- 



■n2, 



1^1 1 -»^2, 







if n is odd, 



if n is even. 



lFl(-n2,i,-^) 

This gives a theoretical way of computing the invariant measure as 

mW(o) 



< i < n2, 



and the mean number of dimers in the stationary regime is given by 

En = M'(l). 

Numerical computation based on this last formula is tedious and in the case described in 
Section 15.31 we have to compute moments repeatedly for each n between and A. The 
recursive method described in the next section provides an alternative adapted to this 
situation. 



4.3 An approach of the invariant measure adapted to numerical com- 
putation 

We provide a different approach, which will allow efficient computations of the mean and 
second moment. If the feedback function in the slow process is linear or quadratic in the 
number of dimers, the infinitesimal transition rates of the slow process at quasi-equilibrium 

712 i 

will only depend on the first two moments. Set y = c+/c-, so that Zn = / -, ttt- 

i=o ^ ' 

Using the following polynomial identities : 

n2 . ,• n2 -2 j 

Ely ^ I y 2 

^^^-^ = ,Z„_2 and ^j;^^^^_=y Zn-, + yZn.,, 

i=l i=l 

the mean := J2o<i<n2 ^/^"(^) ^^'^ second moment E,^ := ^o<j<n2 given by 

w \ ^ ly' Zn-2 T ^ 

= z;: g (iv^2i)!i! = ^ ^ + = ^"(^ + ^"-)- 

In what follows, we give another description of Zn based on the involutions of the permu- 
tation group Sn- This approach will allow to compute the ratios Zn-2/Zn recursively. 



4.3.1 A description of Z„ based on the involutions of Sn 

Let In denote the involution subgroup of the permutation group Sn, i. e. the set of 
permutation of n points a £ Sn so that o"^ is the identity. For a G In, fix((T) denotes 
the number of fixed points of a. For any number n — 2i between and n, the number of 
involutions with n — 2i fixed points is given by 

^ ^~V2y'V 2 )"'\ 2 y ■ i! ~ (n-2i)!i! -2*' 

f[yi{a)=n~2i 

SO that, setting q := -sj c_/2c+ = (2^)""'^/^, Zn can be written as 

" ^ V c_ / (n - 2i)!i! • 2* n! ^ V c_ / n! ^ 

Let Qn denote the polynomial Qnio) ■= so that the partition function and the 

crG/„ 

mean (j4.ip can be written as 

= — rQn(g), and E„ = ^ . . • (4.2) 

n! 2 Qn(g) 

According to [28], one can identify Qn as the Taylor coefficient of a Stieltjes type continued 
fraction. Here we proceed in a recursive way, using the following two propositions. 

Proposition 2 Qni<l) satisfies the relation Qn+i{<l) = (iQuio) + Qn^o)- 

Proof Each involution a £ In induces l + fix((T) involutions in In+i, namely the one that 
fixes the point n + 1 and the fix((T) ones that interchange a fixed point of a with n + 1. 
Partitioning In+i as the set of involutions that fixe n + 1, and those that do not, we see 
that the first set contains exactly the involutions of /„ except that they have one more 
fixed point, namely n + 1. For each a £ In that fixes at least one point, the second set 
contains fix(c7) involutions with one fixed point less, namely the one that is interchanged 
with n + 1. More precisely, the partition of In+i is given by 

In+i = {o" G In+i', o" fixes n + 1} U {(J S In+i', c" does not fix n + 1} 
= /nU IJ y {ao{k,n + l)} 

crdln cr fixes k 

where {k,n + 1) is the permutation of k and n + 1. Therefore, we have the recurrence 
relation 

Qn+i(g)= E g^-^"^) = E '^'^^'^^^^ + E E '^'^^'^^"^ 

cG-fn+l ce/n cG/n, l<A;<fix((T) 

fix((7)>l 

= ^^gfixM+ fix(a)gfi-M-i = <7g„(g) + Q;(g). 

fix(CT)>l 



□ 



Proposition 3 The derivative of Qn is given by Q'niq) = n ■ Qn-i{<l), (ind hence 



Qn+i{q) = qQn{q) + n ■ Qn-i{q)- 



(4.3) 



Proof One can easily compute Qi{q) = q and Q2{q) = + 1. If Q'n{q) = nQn-i{q) for 
some n, using Proposition [2] for the first and last equality and by the induction hypothesis 
for the second one, we have 

Q'n+iiq) = Qn{q) + qQ'nil) + Qnil) = Qn{q) + qnQn-i{q) + nQ'n_i{q) = {n + l)Qn{q)- 

□ 

Remark 3 Let hq{t) := YlT=oQ'ri{q)^- Multiplying both sides of (f^.3p by — and sum- 
ming over all possible n leads to the equation h'g{t) = (t + q)hq{t), with initial condition 
hqifi) = QQ{q) = 1, which has the unique solution 

Kit) = 6^+"*. 

This function is the moment generating function of a normal random variable of mean q 
and variance 1, so that Qn{q) is the n-th moment of a random variable X ~ AA(g, 1). 

ALGORITHM III 

Due to the fast increase of its coefficients, Qn cannot be efficiently computed for large 
n. However, the computation of the mean E„ only involves the ratio Qn-2/Qn- 

Let Cn{q) ■■= %^rT^- Fiom = (^n-iiq) ■ Cn{q) = ^ , (l -qCn{q)), one obtains 

the continued fraction 

-^ = q + ncn{q). (4.4) 

From Qo{q) = 1 and Qi{q) = q, the first term ci is given by ci{q) = 1/q. 

The mean and the second moment (|4.2p or (j4.ip can then be computed recursively as 

ft 

E2 = ^(1 -qcniq)) (l + ^(1 -qcn^2{q)). 
Theorem 4 c„(a;) ^0 as n ^ oo. 

Proof: Suppose that the limsup of the sequence of non-negative numbers {c„(x)}„>i is 
strictly positive, 

limsupcn(x) = a > 0. 

n—*oo 

Using relation ()4.4p . one gets 

1 1 



a = limsup Cn{x) 



x + liminf nc„(x) x + b^ 

n— >oo 

where b := liminf ncn{x) has to be finite. Isolating c„(x) in ()4.4p yields 

n— >oo 

( \ - 1 _ ^ 

ncn^\{x) n 

and we get 

a = limsupc„(x) = limsup 



ncn+i{x) nj liminf nc„+i(x) • ^ b' 



1 1 

Since x > and b < oo, this leads to the contradiction = — . □ 

x + b b 

The above Theorem leave to the somehow counterintuitive conclusion that the fraction 
of dimers is about i for n large, more precisely 



iim — 

n— ►oo fi 



2' 



for every set of positive parameters c+, c_. 

In our concrete Example of Section I4.H we are mainly interested in computing higher 
moments for n proteins. We will show below that the computation of higher moments 
only requires the knowledge of the first moments for a lower number of proteins. More 
precisely, let denote the polynomial 

j 

P,+i(i) := i • (i - 1) • • • (i - 2) • • • (i - i) =: - ai,/- 

1=1 

With the convention that Ej = for i < 0, the higher moments can be computed as 
combinations of the means for lower total number of proteins. 

Lemma 5 E„(P,-+i(i:>)) = En-2j ■ E„_2(j_i) • • • E„_2 • En- 
Proof: We show that both terms are equal to y-' — . 

Zn 

Pj+W' _^i-(i-l)--(i-2)'--(i-j)s' 



Z„.E„(P,,.(D))=J:-2^ = E- 



^(n-2i)\i\ ^ {n-2i)\i\ 

V- . 

(n-2i)!(i-j-l)! 
■ ^Z^^ - 2{z - j - 1) - 2(j + 1))! (i-j- 1)! 

n2-(j+l) 

^ E 



.=0 .ri-2{j + l)-2i)\i\ 



and with (14.21). we have 



7 + 1 ry 

y ^n-2(i+l)) 



E„_2, • E„_2(,_i) ■ ■ ■ E„_2 • =y %^ • y ---y^ 

^n-2j ^n-2(j-l) 

Prom the preceding Lemma, we can give a formula for arbitrary moments: 
Theorem 5 The j + 1-th moment of D is given by 

j 

1=1 



Proof: From the definition of the coefficients aij, 

j 

1=1 

hence with Lemma [5] the statement holds. 

5 Modeling the regulatory gene network 

We first recall the basic mathematical steps which lead to the mathematical model studied 
in [TT]. We shall see that the time evolution of the gene products involved in the network 
described in the Introduction can be modeled by the following set of chemical reactions: 

A^ta, 0^^, ^ ^, = ^1^1 = 0, 1, o^ + A^ ot 

where the symbol A stands for activator proteins, and Of, / = 0, 1 denotes the state of 
the promoter related to the activator, and by the chemical reactions related to therapeutic 
proteins as given by 

Ol + A^ Of, X^$, %^X, Jli =Jll,l = 0, 1, 

K 

where A denotes activator proteins, and Of is defined in a similar way for the promoter 
of the therapeutic gene and X symbolizes therapeutic proteins. 



5.1 Equilibrium equations 

The modeling of the time evolution of the number of molecules involved in the regulatory 
network is obtained by assuming that extrinsic noise, here the random fluctuations of the 
number of repressor and doxycycline molecules attains a chemical equilibrium. We first 
describe mathematically the effect of this extrinsic noise on the promoters associated to 
the activator and therapeutic genes. We follow Section 28 of Consider a multiple 
binding of a ligand X with 1 ^ i ^ k different binding sites on a polymer P, 

P + iX — > PXi, i = l---k, 

with equilibrium constants 



The binding polynomial is defined by 



[P][XY 



1=1 

where in the sequel X will denote the number of ligand molecules. The proportion of P 

\PX-] 

molecules that are in the i-th liganded state is ' , and the average number of bound 
sites is 

dln(Q) _ T!:=oiKa' 
- dh^ - Q{X) ■ 



Example 2 // the k binding sites are independent, there is no cooperativity, and one has 

Q[X) = {l + KX)\ 

with 

M(X) """^ 



l + KX 



Example 3 If a P molecule binds to exactly k ligands molecules at a time, one gets the 
Hill model kX + P — > PX/^, with equilibrium constant K , and 

Q{X) = 1 + KX\ 

kKX'' 



M{X) 



1 + KXi" ' 



5.2 Transgene expression 

Reaction of TetR repressor and doxycycline 

The reaction between the doxycychne (Dox) and the repressor (R) is described as R + 
Dox — > RD with some forward rate, and RD — R + D with some backward rate; 
Considering equilibrium of constant Krd, we can write Kjid = [RD]/{[R][Dox]), where 
[Dox] gives the number of molecules of doxycycline. The free proportion of (R) molecules, 
i.e. not involved in the RD complex, can, when considered as ligand, bind to the kr sites 
of the TetR operators (the binding sites where repressor molecules can bind, see e.g. p2]). 
inhibating thus both the transactivator and the synthesis of the transgene product. We 
next estimate the average fraction F{R, [Dox]) of sites free of repressor. Let kr denote the 
number of sites where repressors molecules can bind. Using a Hill model of cooperativity 
(see Example El, one gets that the average number of bound sites is then given by 



l + Kr[R] 

Then, 

F(R iDox]) = '^r-MriiR]) ^ I 

The total number of repressor, denoted by Rtot, is such that 

[Rtot] ~ [R] + Krd[R][Dox]) = [R]{1 + Krd[Dox]), 

when we neglect the amount of repressor involved in the kj- binding sites. Set [Rtot] = Rmax- 
Then 

[1 + Krd[Dox])''^ 



F{R, [Dox]) 



[l+KRD[Dox])^r +KrR 



max 



Transactivator 



The transactivator is repressed by the bound repressors, and activated by the positive 
feedback loop; The above considerations suggest a stochastic model of transactivation with 

g[d)=VF{R, [Z)ox]K+5(0), 

where a denotes the number of binding sites on the activator, d denotes the number of 
transactivator dimers, and where V is a parameter. (7(0) > is introduced here to model 
basal activity for the off to on transitions. 



5.3 The regulatory network at quasi-equilibrium 



We assume that the promoter switch from the off to on state at rate 

g{D'{t)) = VF{R, [Dox]){D'{t)r + <?(0), 

where D^{t) is the number of transactivator dimers present at time t, a is the number 
of activator binding sites and V is a parameter. This models the positive feedback loop. 
We suppose that degradation occurs at a rate proportional to the number of monomers 
N^(t) — 2D^{t), with constant of proportion v. The transactivator process is given by the 
triplet {N^{t), l^'^(t), -D^(t)), where we assume fast dimerization, as given in the preceeding 
paragraph. The time evolution of the network is described by the random process 

V'it) = {N^it),Y%t),D^{t),X^{t),Z^it)), 

where X'^{t) denotes the number of therapeutic proteins {X) present in the cell at time 
t, and where Z'^{t) = 0,1 denotes the state of its associated promoter (off/on). These 
chemical reactions are described schematically as 

o^ + v!^ of, ^^0, 1 = 0,1, 

K 

where Of , I = 0,1 accounts for the state of the promoter related to the therapeutic 
gene, ^A denotes activator proteins (monomers), /ii = fil, I = 0,1, X denotes therapeutic 
proteins and Jli = fll, I = 0,1. We again assume a quasi-equilibrium with fast dimerization, 
to get the limiting process 

7]{t) = {N{t),Y{t),X{t),Zit)), 
associated with the set of coupled chemical reactions 

A^ta, taJ^A, 1 = 0,1, o^ + Ai^ot 

K 

Of + A^ Ol, X^%, %^X, 1 = 0,1, 

K 

with quasi-equilibrium transition rates given by (see Section H]) 

v{n) = z^E^n(n — 2d), 

g{n) = VF{R, [Dox])W.^u{(f) + g{0), 

h{n) = VF{R, [Dox\)¥.^,n{(f) + h{0). 

5.4 A semi-stochastic mean field model 

We consider the time evolution of the network in a semi-stochastic version by supposing 
that the rates g{N{t)) and h{N{t)) are replaced by c{t) = giE{N{t - 9))) and c{t) = 
h(K(N{t — 9))). The method is similar to what is presented in Section [31 Consider the 
family of transition kernels L^^j = (<?cc(("') x, z){n' , y' , x' , z')) given by 

Qcc{{n,y,x,z){n + l,y,x,z)) = yfi, qc^{{n,y,x,z){n - l,y,x,z)) = u n, 



1 

20 40 60 80 100 

Dox 



Figure 4: Assay of the regulation of EGFP expression in living cells. The experimental 
curve obtained in [11], (in black) giving the average expression of therapeutic proteins as 
function of the number [Dox] of doxycycline molecules is compared to the curves obtained 
from the mean-field model, where the green and blue curves provide the mean expression 
levels of activator and therapeutic proteins, respectively. 



qcc{{n, y, x, z){n, 1 -y,x, z)) = Ky + {1 - y)c, qcc{{n, y, x, z){n, y,x + l, z)) = zjl, 

qcc{{n,y,x,z){n,y,x - 1, z)) = vx, qcci{n,y,x,z){n,y,x,l - z)) = kz + {1- z)c. 

The nice feature of this kernel is that its steady state distribution is the product tt'^ 
of the stationary distributions associated with the self regulated genes given by the two 
sets of chemical reactions 

^^0, ^^A, 1 = 0,1, Oo^Oi, 

K 

X^$, {b^X, 1 = 0,1, Oo^Oi, 

Both measures can be computed efficiently by using either the method of transfer matrices 
or the exact analytical expressions obtained through generating functions. Coming back 
to the time evolution of the network under a mean field model, the method is similar to 
that given in Section [3] and consists in two basic steps: 

• Find the limiting values c(oo) = lim^^oo c(t) and c(oo) = limt^ooc(t), when they 
exist, 

• compute the steady state distribution 7r''''^''°°-* (8> 7r'^''^*^°°-*, and the related means and 
variances. 

Let (iV(oo), y(cxD), X(oo), Z{oo)) be distributed according to the steady state distribution. 
Proceeding as in Example [H the coefficient of variation related to the activator satisfies 



2 _ Var(iV(oo)) _ 1 T2 Var(y(oo)) 

^~ E(iV(oo))2 ~ E(iV(oo)) ^ n + r2 E(y(oo))2 ' 



CV^ = ^;t7^7t^ = + 

where 



E(iV(oo)) = ^ ^"^"^^ , n = -, and - ^ 



U c{oo) -\- K I/' c(cX)) + K 

Similarly the CV of the transgene product is such that 

2 ^ Var(X(oo)) ^ 1 ?2 Var(Z(oo)) 

^ E(X(00))2 E(X(00)) TI+T2 E(Z(00))2 ' 



where 



E(X(oc)) = t '^^^ 



c(oo) + /? 



Ti = —, and T2 



V c(oo) + K 

In what follows, we consider c{po) and c(cx3). 
5.4.1 c(oo) and c(cx3) for linear feedbacks 

In the linear case^c(t) = g{¥.(N {t - B))) = go + giE{N {t - 9)) and c(t) = h{E{N{t-9))) 
ho + hiE,{N{t — 9)). We thus consider the averages 

E{t) = E{N{t)), G{t) = E{Y{t)) and E{t) = E{X{t)), G{t) = E{Z{t)), 

which satisfy the system of delayed differential equations 

^ = (50 + 9iE{t - 9)){1 - G{t)) - KG{t), ^ = fiG{t) - uE{t), 



^ = {ho + hiE{t - e)){l - G{t)) - KG{t), ^ = flG{t) - VE{t). 

The results of Section [3.11 yield that the above delayed differential system has a globally 
asymptotically stable equilibrium point (i?(oo), G(oo), £'(00), (^(oo)), with 

E{oo) = ^G(oo), (^70 + 5i^G(oo))(l - G(oo)) = kG{oo), 
(ho + /ii^G(oo))(l - G(oo)) = kG(oo) and ^(00) = ^G(oo). 

Finally 

c(oo) =50 + 9iE{oo) and c(oo) = /iq + /ii-E(oo). 

6 Conclusion and discussion 

In this work, we considered a class of self-regulated genes which are the building blocks of 
most of the existing gene networks. We provided efficient numerical algorithms for com- 
puting the steady state distribution of the number of produced proteins. These results 
permit to handle more complex situations, and to understand the effect of positive or 
negative feedbacks in the network's dynamics. They might also be useful in reverse engi- 
neering problems when infering for example the parameters defining chemical reactions. 
Next, we considered in Sections 3 and 5 mean field models with time delays which are of 
special interest in synthetic biology or in biotechnology, where small engineered regulatory 
networks are inserted at random in host genomes. Mathematical results in this setting are 
very scarce, and it is known that such systems can exhibit oscillations (see e.g. [5] or |17j). 
Section 3 provides convergence results for mean field models with time delays, which might 
open ways for handling more complex gene networks. Experimental results performed in 
living cells were in good concordance with our predictions. This shows that such models 
can provide relevant informations concerning complex systems, and that mathematical 
models can be efficiently used for the design of new regulatory gene networks in synthetic 
biology or in biotechnology. 



7 Appendix 

7.1 Fluctuation Lemma 

The following result is a slight modification of Lemma 4.2 in |20j : 



Lemma 6 Let f : M+ — >■ M 6e bounded and differentiable, f denoting its derivative. There 
exist increasing sequences tn T +00 and Sn T +00 , such that 

f{tn) ^ 7, f{tn) ^ 0, and f{Sn) ^ /, ^ 

as n ^ +00, where for a function f we denote 

/ := limsup/(t), / := liminf /(t). 

7.2 Convergence of time-nonhomogeneous Markov Chains 

We consider a nonhomogeneous Markov chain X{t) taking values in N, of instantaneous 
transition matrix Qt = {qt{i, The following Theorem is proved in [T]. 



Theorem 6 Assume that we can find nonnegative constants q{i,j) such that 

POD 

y]9(i,i) < +00, / Watihi) - ^/q{i,j)fdt < +00, 
~T, Jo 



and 



I 



s{i,j)ds = I qs{i,j)ds. 



lO^si^t, g(i,i)>0 Jo 

Let Qo = {q{i, j))i,jeN, o,nd let X^{t) be the related N-valued Markov chain. Suppose that 
Qo is ergodic, that is that there is a unique probability measure vr such that ttQq = and 

lim P(XO(t) = =i)= vr,, Vs, i, j. 

r— >oo 

Then 

lim P{X{t) = j\X{s) =i)= TTj, Vs, i, j. 

t — >oo 

7.3 Zeros of an exponential polynomial 

We consider the exponential polynomial H(z) = {z^ -\- pz + q)e^ + r, where p is real and 
positive, q is real and nonnegative, and r is real. The following Theorem is proved in [3], 
p. 449. 

Theorem 7 Denote by ak {k > 0) the sole root of the equation cot(a) = (a'^ — q)/p which 
lies on the interval {kir, kir + tt). We define the number w as follows: 

1. if r > and p^ > 2q, w = 1; 



2. if r > and p^ < 2q, w is the odd k for which au lies closest to \/ q— p^ /2; 

3. ifr<0 and p^ >2q,w = 2; 



4- if r < and p^ < 2q, w is the even k for which a^ lies closest to \J q— 

Then, a necessary and sufficient condition that all roots of H{z) = lie to the left of the 
imaginary axis is that 

1. r>0 and r sm{aw) / (pow) < 1 or 

2. —q<r<0 and r sm{aw) / (pow) < 1- 
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